Searching Massive Data Streams Using Multipattern Regular Expressions
نویسندگان
چکیده
This paper describes the design and implementation of lightgrep, a multipattern regular expression search tool that efficiently searches massive data streams. lightgrep addresses several shortcomings of existing digital forensic tools by taking advantage of recent developments in automata theory. The tool directly simulates a nondeterministic finite automaton, and incorporates a number of practical optimizations related to searching with large pattern sets.
منابع مشابه
Multiple Pattern Matching Algorithms on Collage System
Compressed pattern matching is one of the most active topics in string matching. The goal is to find all occurrences of a pattern in a compressed text without decompression. Various algorithms have been proposed depending on underlying compression methods in the last decade. Although some algorithms for multipattern searching on compressed text were also presented very recently, all of them are...
متن کاملFine Classification & Recognition of Hand Written Devnagari Characters with Regular Expressions & Minimum Edit Distance Method
Regular expressions are extremely useful, because they allow us to work with text in terms of patterns. They are considered the most sophisticated means of performing operations such as string searching, manipulation, validation, and formatting in all applications that deal with text data. Character recognition problem scenarios in sequence analysis that are ideally suited for the application o...
متن کاملA Dynamically Reconfigurable FPGA-Based Pattern Matching Hardware for Subclasses of Regular Expressions
In this paper, we propose a novel architecture for largescale regular expression matching, called dynamically reconfigurable bitparallel NFA architecture (Dynamic BP-NFA), which allows dynamic loading of regular expressions on-the-fly as well as efficient pattern matching for fast data streams. This is the first dynamically reconfigurable hardware with guaranteed performance for the class of ex...
متن کاملTop-k Pattern Matching Using an Information-Theoretic Criterion over Probabilistic Data Streams
As the development of data mining technologies for sensor data streams, more sophisticated methods for complex event processing are demanded. In the case of event recognition, since event recognition results may contain errors, we need to deal with the uncertainty of events. We therefore consider probabilistic event data streams with occurrence probabilities of events, and develop a pattern mat...
متن کاملString Pattern Matching Fora Deluge Survival
String Pattern Matching concerns itself with algorithmic and combi-natorial issues related to matching and searching on linearly arranged sequences of symbols, arguably the simplest possible discrete structures. As unprecedented volumes of sequence data are amassed, disseminated and shared at an increasing pace, eeective access to, and manipulation of such data depend crucially on the eeciency ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011